Skeleton-based action recognition via spatial and temporal transformer networks
Authors
Abstract
Skeleton-based Human Activity Recognition has attracted great interest in recent years, as skeleton data has proven robust to illumination changes, body scales, dynamic camera views, and complex backgrounds. In particular, Spatial-Temporal Graph Convolutional Networks (ST-GCN) have been shown to be effective in learning both spatial and temporal dependencies on non-Euclidean data such as skeleton graphs. Nevertheless, an effective encoding of the latent information underlying the 3D skeleton is still an open problem, especially when it comes to extracting effective information from joint motion patterns and their correlations. In this work, we propose a novel Spatial-Temporal Transformer network (ST-TR) which models dependencies between joints using the Transformer self-attention operator. In our ST-TR model, a Spatial Self-Attention module (SSA) is used to understand intra-frame interactions between different body parts, and a Temporal Self-Attention module (TSA) is used to model inter-frame correlations. The two are combined in a two-stream network, whose performance is evaluated on three large-scale datasets, NTU-RGB+D 60, NTU-RGB+D 120, and Kinetics Skeleton 400, consistently improving backbone results. Compared with methods that use the same input data, the proposed ST-TR achieves state-of-the-art performance on all datasets when using joints' coordinates as input, and results on par with the state of the art when adding bones information.
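The intra-frame and inter-frame attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the tensor sizes, the random projection matrices, the single attention head, and the function names `softmax` and `self_attention` are all assumptions chosen for illustration, and the same weights are reused for both directions only to keep the example short. It shows plain scaled dot-product self-attention applied once across the joints of each frame (in the spirit of SSA) and once across the frames of each joint (in the spirit of TSA).

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x has shape (..., N, C_in); attention is computed among the N elements
    of the second-to-last axis. Returns the output (..., N, C_out) and the
    attention weights (..., N, N).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Hypothetical sizes: T frames, V joints, C coordinates, D embedding channels.
T, V, C, D = 4, 25, 3, 8
rng = np.random.default_rng(0)
skeleton = rng.normal(size=(T, V, C))  # random stand-in for joint coordinates
w_q, w_k, w_v = (rng.normal(size=(C, D)) for _ in range(3))  # untrained weights

# Spatial direction: within each frame, every joint attends to every joint.
spatial_out, spatial_w = self_attention(skeleton, w_q, w_k, w_v)

# Temporal direction: for each joint, every frame attends to every frame.
per_joint = np.swapaxes(skeleton, 0, 1)  # shape (V, T, C)
temporal_out, temporal_w = self_attention(per_joint, w_q, w_k, w_v)

print(spatial_w.shape)   # one V x V attention map per frame
print(temporal_w.shape)  # one T x T attention map per joint
```

In this sketch the only difference between the two directions is which axis plays the role of the sequence: joints within a frame for the spatial case, frames of a single joint for the temporal case.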
Similar resources
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition
Dynamics of human body skeletons convey significant information for human action recognition. Conventional approaches for modeling skeletons usually rely on hand-crafted parts or traversal rules, thus resulting in limited expressive power and difficulties of generalization. In this work, we propose a novel model of dynamic skeletons called Spatial-Temporal Graph Convolutional Networks (ST-GCN), ...
Spatio-Temporal Graph Convolution for Skeleton Based Action Recognition
Variations of human body skeletons may be considered as dynamic graphs, which are a generic data representation for numerous real-world applications. In this paper, we propose a spatio-temporal graph convolution (STGC) approach for assembling the successes of local convolutional filtering and the sequence learning ability of autoregressive moving average. To encode dynamic graphs, the constructed mul...
Tied Spatial Transformer Networks for Character Recognition
This paper reports a new approach applied to convolutional neural networks (CNNs), which uses spatial transformer networks (STNs). It consists in training an architecture which combines a localization CNN and a classification CNN, for which most of the weights are tied, which from here on we will name Tied Spatial Transformer Networks (TSTNs). The localization CNN is used for predicting the bes...
Spatial Transformer Networks
Convolutional Neural Networks define an exceptionally powerful class of models, but are still limited by the lack of ability to be spatially invariant to the input data in a computationally and parameter efficient manner. In this work we introduce a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network. This differentiable mod...
Action Recognition from Skeleton Data Via Analogical Generalization
Human action recognition remains a difficult problem for AI. Traditional machine learning techniques have had some success, but have two disadvantages. First, these models are typically black boxes whose internal models are not inspectable and whose results are not explainable. Second, typically massive amounts of data are needed to achieve good recognition performance. This paper describes a n...
Journal
Journal title: Computer Vision and Image Understanding
Year: 2021
ISSN: 1090-235X, 1077-3142
DOI: https://doi.org/10.1016/j.cviu.2021.103219